# Billion-parameter ViT

Sapiens Pose 1b

Pose-Sapiens-1B is a high-resolution human pose estimation model built on the Vision Transformer (ViT) architecture. It was pre-trained on 300 million human images at 1024x1024 resolution and detects 308 keypoints covering the body, face, hands, and feet.

Tags: Pose Estimation, English

Sapiens Seg Foreground 1b Torchscript

Sapiens is a Vision Transformer (ViT) model pre-trained on 300 million high-resolution human images. This 1B-parameter variant, distributed in TorchScript format, targets foreground person segmentation.

Tags: Image Segmentation, English
